Abstract

A new bandwidth selection method for the fuzzy regression discontinuity estimator
is proposed. The method chooses two bandwidths simultaneously, one for each side of
the cut-off point, by using a criterion based on the estimated asymptotic mean
squared error that takes into account a second-order bias term. A simulation
study demonstrates the usefulness of the proposed method.
1 Introduction


The fuzzy regression discontinuity (FRD) estimator, developed by Hahn, Todd, and Van der Klaauw
(2001) (hereafter HTV), has found numerous empirical applications in economics. The
target parameter in the FRD design is the ratio of the differences of two conditional
mean functions, which is interpreted as the local average treatment effect (LATE). The most
frequently used estimation method is the nonparametric method using the local linear
regression (LLR). Imbens and Kalyanaraman (2012) (hereafter IK) propose a bandwidth
selection method specifically aimed at the FRD estimator, which uses a single bandwidth
to estimate all conditional mean functions.

This paper proposes to choose two bandwidths simultaneously, one for each side of
the cut-off point. In the context of the sharp RD (SRD) design, Arai and Ichimura (2015)
(hereafter AI) show that the simultaneous selection method is theoretically superior to
the existing methods, and their extensive simulation experiments verify the theoretical
predictions. We extend their approach to the FRD estimator. A simulation study
illustrates the potential usefulness of the proposed method.^1


2 Bandwidth Selection of The FRD Estimator 

For individual $i$, potential outcomes with and without treatment are denoted by $Y_i(1)$ and
$Y_i(0)$, respectively. Let $D_i$ be a binary variable that stands for the treatment status, 0 or
1. Then the observed outcome, $Y_i$, is described as $Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$.
Throughout the paper, we assume that $(Y_1, D_1, X_1), \ldots, (Y_n, D_n, X_n)$ are i.i.d.
observations and $X_i$ has the Lebesgue density $f$.

To define the parameter of interest for the FRD design, denote $m_{Y+}(x) = E(Y_i \mid X_i = x)$
and $m_{D+}(x) = E(D_i \mid X_i = x)$ for $x > c$. Suppose that $\lim_{x \searrow c} m_{Y+}(x)$ and
$\lim_{x \searrow c} m_{D+}(x)$ exist and they are denoted by $m_{Y+}(c)$ and $m_{D+}(c)$, respectively.
We define $m_{Y-}(c)$ and $m_{D-}(c)$ similarly. The conditional variances and covariance,
$\sigma^2_{Yj}(c) > 0$, $\sigma^2_{Dj}(c) > 0$, $\sigma_{YDj}(c)$, and the second and third derivatives
$m^{(2)}_{Yj}(c)$, $m^{(3)}_{Yj}(c)$, $m^{(2)}_{Dj}(c)$, $m^{(3)}_{Dj}(c)$, for $j = +, -$, are defined in
the same manner. We assume all the limits exist and are bounded above.

^1 Matlab and Stata codes to implement the proposed method are available at
http://www3.grips.ac.jp/~yarai/.


In the FRD design, the treatment status depends on the assignment variable $X_i$ in
a stochastic manner and the propensity score function is known to have a discontinuity
at the cut-off point $c$, implying $m_{D+}(c) \neq m_{D-}(c)$. Under the conditions of HTV,
Porter (2003) or Dong and Lewbel (forthcoming), the LATE at the cut-off point is given by
$\tau(c) = (m_{Y+}(c) - m_{Y-}(c))/(m_{D+}(c) - m_{D-}(c))$. This implies that estimation of $\tau(c)$
reduces to estimating the four conditional mean functions nonparametrically, and the
most popular method is the LLR because of its automatic boundary adaptive property
(Fan, 1992).


Estimating the four conditional expectations, in principle, requires four bandwidths.
IK simplify the choice by using a single bandwidth to estimate all functions,
as they do for the SRD design. For the SRD design, AI propose to choose two bandwidths,
one for each side of the cut-off point, because the curvatures of the conditional mean
functions and the sample sizes on the left and the right of the cut-off point may differ
significantly. We use the same idea here, but take into account the bias and variance due
to estimation of the denominator as well. For simplification, we propose to choose one
bandwidth, $h_+$, to estimate $m_{Y+}(c)$ and $m_{D+}(c)$ and another bandwidth, $h_-$, to estimate
$m_{Y-}(c)$ and $m_{D-}(c)$ because it is also reasonable to use the same group on each side.
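As a minimal sketch of how $\tau(c)$ might be estimated with one bandwidth per side, the Python code below fits a kernel-weighted line on each side of the cut-off and forms the ratio of the two jumps. The function names, the triangular kernel and the hand-coded weighted least squares are illustrative assumptions; this is not the authors' Matlab or Stata implementation.

```python
def local_linear_at_cutoff(x, y, c, h, side):
    """Local linear estimate of E[y | x = c] from one side of c.

    side = +1 uses observations with x >= c, side = -1 those with x < c.
    The triangular kernel is an assumption made for illustration only.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - c
        if (d >= 0) != (side > 0):
            continue                      # observation is on the other side
        u = abs(d) / h
        if u >= 1.0:
            continue                      # outside the bandwidth
        w = 1.0 - u                       # triangular kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few observations inside the bandwidth")
    # Intercept of the weighted least squares line, i.e. the fit at x = c.
    return (s2 * t0 - s1 * t1) / det


def frd_estimate(x, y, d, c, h_plus, h_minus):
    """Ratio of the outcome jump to the treatment-probability jump at c."""
    jump_y = (local_linear_at_cutoff(x, y, c, h_plus, +1)
              - local_linear_at_cutoff(x, y, c, h_minus, -1))
    jump_d = (local_linear_at_cutoff(x, d, c, h_plus, +1)
              - local_linear_at_cutoff(x, d, c, h_minus, -1))
    return jump_y / jump_d


# Noise-free toy data: piecewise-linear outcome, and treatment probabilities
# (0.9 and 0.1) used in place of a binary D so the check is deterministic.
xs = [i / 100 for i in range(-100, 101)]
ys = [1.0 + 2.0 * v if v >= 0 else 0.2 + v for v in xs]
ds = [0.9 if v >= 0 else 0.1 for v in xs]
tau_hat = frd_estimate(xs, ys, ds, c=0.0, h_plus=0.3, h_minus=0.5)
```

In this toy example the outcome jump and the treatment-probability jump are both 0.8, so the ratio is 1 regardless of the two bandwidths; with noisy data the choice of `h_plus` and `h_minus` is exactly what the rest of the section addresses.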


2.1 Optimal Bandwidths Selection for the FRD Estimator 

We consider the estimator of $\tau(c)$, denoted $\hat{\tau}(c)$, based on the LLR estimators of the four
unknown conditional mean functions. We propose to choose two bandwidths simultaneously
based on an asymptotic approximation of the mean squared error (AMSE). To
obtain the AMSE, we assume the following:

ASSUMPTION 1 (i) (Kernel) $K(\cdot) : \mathbb{R} \to \mathbb{R}$ is a symmetric second-order kernel
function that is continuous with compact support; (ii) (Bandwidth) The positive sequence of
bandwidths is such that $h_j \to 0$ and $nh_j \to \infty$ as $n \to \infty$ for $j = +, -$.
Let $\mathcal{D}$ be an open set in $\mathbb{R}$, $k$ be a nonnegative integer, $C_k$ be the family of $k$ times
continuously differentiable functions on $\mathcal{D}$ and $g^{(k)}(\cdot)$ be the $k$th derivative of $g(\cdot) \in C_k$.
Let $\mathcal{G}_k(\mathcal{D})$ be the collection of functions $g$ such that $g \in C_k$ and
$|g^{(k)}(x) - g^{(k)}(y)| \le M_k |x - y|^{\alpha}$, $x, y \in \mathcal{D}$, for some positive $M_k$ and some
$\alpha$ such that $0 < \alpha \le 1$.

ASSUMPTION 2 The density of $X$, $f$, which is bounded above and strictly positive at
$c$, is an element of $\mathcal{G}_1(\mathcal{D})$ where $\mathcal{D}$ is an open neighborhood of $c$.

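ASSUMPTION 3 Let $\delta$ be some positive constant. The $m_{Y+}$, $\sigma^2_{Y+}$ and $\sigma_{YD+}$ are
elements of $\mathcal{G}_3(\mathcal{D}_1)$, $\mathcal{G}_0(\mathcal{D}_1)$ and $\mathcal{G}_0(\mathcal{D}_1)$, respectively, where $\mathcal{D}_1$ is a
one-sided open neighborhood of $c$, $(c, c + \delta)$. Analogous conditions hold for $m_{Y-}$,
$\sigma^2_{Y-}$ and $\sigma_{YD-}$ on $\mathcal{D}_0$ where $\mathcal{D}_0$ is a one-sided open neighborhood of $c$,
$(c - \delta, c)$.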

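The following approximation holds for the MSE under the conditions stated above.


LEMMA 1 Suppose Assumptions 1-3 hold. Then, it follows that

$$
MSE_n(h_+, h_-) = \frac{1}{(\tau_D(c))^2}
\Big\{ \big[\phi_+(c)h_+^2 - \phi_-(c)h_-^2\big] + \big[\psi_+(c)h_+^3 - \psi_-(c)h_-^3\big] + o\big(h_+^3 + h_-^3\big) \Big\}^2
+ \frac{v}{nf(c)(\tau_D(c))^2} \Big\{ \frac{\omega_+(c)}{h_+} + \frac{\omega_-(c)}{h_-} \Big\}
+ o\Big( \frac{1}{nh_+} + \frac{1}{nh_-} \Big)
\tag{1}
$$

where, for $j = +, -$ and $k = Y, D$,

$$
\tau_D(c) = m_{D+}(c) - m_{D-}(c), \qquad
\omega_j(c) = \sigma^2_{Yj}(c) + \tau(c)^2 \sigma^2_{Dj}(c) - 2\tau(c)\sigma_{YDj}(c),
$$

$$
\phi_j(c) = C_1 \big[ m^{(2)}_{Yj}(c) - \tau(c) m^{(2)}_{Dj}(c) \big], \qquad
\psi_j(c) = \zeta_{Yj}(c) - \tau(c)\zeta_{Dj}(c),
$$

$$
\zeta_{kj}(c) = (-j) \Big\{ \xi_1 \Big[ \frac{m^{(2)}_{kj}(c)}{2} \frac{f^{(1)}(c)}{f(c)} + \frac{m^{(3)}_{kj}(c)}{6} \Big]
- \xi_2 \frac{m^{(2)}_{kj}(c)}{2} \frac{f^{(1)}(c)}{f(c)} \Big\},
$$

$$
C_1 = \frac{\mu_2^2 - \mu_1\mu_3}{2(\mu_0\mu_2 - \mu_1^2)}, \qquad
v = \frac{\mu_2^2\nu_0 - 2\mu_1\mu_2\nu_1 + \mu_1^2\nu_2}{(\mu_0\mu_2 - \mu_1^2)^2},
$$

$$
\xi_1 = \frac{\mu_2\mu_3 - \mu_1\mu_4}{\mu_0\mu_2 - \mu_1^2}, \qquad
\xi_2 = \frac{(\mu_2^2 - \mu_1\mu_3)(\mu_0\mu_3 - \mu_1\mu_2)}{(\mu_0\mu_2 - \mu_1^2)^2},
$$

$$
\mu_j = \int_0^{\infty} u^j K(u)\,du, \qquad \nu_j = \int_0^{\infty} u^j K^2(u)\,du.
$$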
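To make the kernel constants of Lemma 1 concrete, the following sketch evaluates $C_1$, $v$, $\xi_1$ and $\xi_2$ in exact rational arithmetic. It assumes the triangular kernel $K(u) = (1 - |u|)$ on $[-1, 1]$, for which the one-sided moments have the closed forms $\mu_j = 1/((j+1)(j+2))$ and $\nu_j = 2/((j+1)(j+2)(j+3))$; the kernel choice and the function names are assumptions made for illustration, since the text only requires a symmetric second-order kernel with compact support.

```python
from fractions import Fraction


def triangular_moments():
    """One-sided moments of the triangular kernel K(u) = 1 - |u| on [-1, 1].

    mu[j] = int_0^1 u^j (1 - u) du, nu[j] = int_0^1 u^j (1 - u)^2 du.
    """
    mu = [Fraction(1, (j + 1) * (j + 2)) for j in range(5)]
    nu = [Fraction(2, (j + 1) * (j + 2) * (j + 3)) for j in range(3)]
    return mu, nu


def kernel_constants(mu, nu):
    """Constants C_1, v, xi_1, xi_2 as defined after equation (1)."""
    den = mu[0] * mu[2] - mu[1] ** 2
    c1 = (mu[2] ** 2 - mu[1] * mu[3]) / (2 * den)
    v = (mu[2] ** 2 * nu[0] - 2 * mu[1] * mu[2] * nu[1]
         + mu[1] ** 2 * nu[2]) / den ** 2
    xi1 = (mu[2] * mu[3] - mu[1] * mu[4]) / den
    xi2 = ((mu[2] ** 2 - mu[1] * mu[3])
           * (mu[0] * mu[3] - mu[1] * mu[2])) / den ** 2
    return {"C1": c1, "v": v, "xi1": xi1, "xi2": xi2}


constants = kernel_constants(*triangular_moments())
```

Using `Fraction` avoids rounding in the small denominators; for another kernel the two moment lists would be replaced by the corresponding integrals.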
A standard approach applied in this context is to minimize the following AMSE,
ignoring higher order terms:

$$
AMSE_n(h_+, h_-) = \frac{1}{(\tau_D(c))^2} \Big\{ \phi_+(c)h_+^2 - \phi_-(c)h_-^2 \Big\}^2
+ \frac{v}{nf(c)(\tau_D(c))^2} \Big\{ \frac{\omega_+(c)}{h_+} + \frac{\omega_-(c)}{h_-} \Big\}.
\tag{2}
$$

As AI observed, (i) while the optimal bandwidths that minimize the AMSE (2) are
well-defined when $\phi_+(c) \cdot \phi_-(c) < 0$, they are not well-defined when $\phi_+(c) \cdot \phi_-(c) > 0$
because the bias term can be removed by a suitable choice of bandwidths and the bias-variance
trade-off breaks down;^2 (ii) when the trade-off breaks down, a new optimality criterion
becomes necessary in order to take higher-order bias terms into consideration. We define
the asymptotically first-order optimal (AFO) bandwidths, following AI.
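The breakdown of the trade-off can be illustrated numerically. In the sketch below (function names are hypothetical), the first-order bias term $\phi_+(c)h_+^2 - \phi_-(c)h_-^2$ vanishes whenever $h_- = \{\phi_+(c)/\phi_-(c)\}^{1/2} h_+$, which is only possible when the two coefficients share the same sign.

```python
import math


def first_order_bias(phi_plus, phi_minus, h_plus, h_minus):
    """First-order bias term of the AMSE, up to the common 1/tau_D factor."""
    return phi_plus * h_plus ** 2 - phi_minus * h_minus ** 2


def bias_cancelling_ratio(phi_plus, phi_minus):
    """Ratio h_minus / h_plus that zeroes the first-order bias, or None.

    A real solution exists only when phi_plus * phi_minus > 0.
    """
    if phi_plus * phi_minus <= 0:
        return None
    return math.sqrt(phi_plus / phi_minus)


lam = bias_cancelling_ratio(1.0, 4.0)           # same sign: cancellation possible
residual = first_order_bias(1.0, 4.0, 0.3, lam * 0.3)
no_ratio = bias_cancelling_ratio(1.0, -4.0)     # opposite signs: no cancellation
```

Along the line $h_- = \lambda h_+$ the squared-bias part of (2) is identically zero, so (2) alone would push both bandwidths to be as large as possible; this is why a higher-order bias term is needed in that case.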


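DEFINITION 1 The AFO bandwidths for the FRD estimator minimize the AMSE
defined by

$$
AMSE_{1n}(h_+, h_-) = \frac{1}{(\tau_D(c))^2} \Big\{ \phi_+(c)h_+^2 - \phi_-(c)h_-^2 \Big\}^2
+ \frac{v}{nf(c)(\tau_D(c))^2} \Big\{ \frac{\omega_+(c)}{h_+} + \frac{\omega_-(c)}{h_-} \Big\}
$$

when $\phi_+(c) \cdot \phi_-(c) < 0$. When $\phi_+(c) \cdot \phi_-(c) > 0$, the AFO bandwidths for the FRD
estimator minimize the AMSE defined by

$$
AMSE_{2n}(h_+, h_-) = \frac{1}{(\tau_D(c))^2} \Big\{ \psi_+(c)h_+^3 - \psi_-(c)h_-^3 \Big\}^2
+ \frac{v}{nf(c)(\tau_D(c))^2} \Big\{ \frac{\omega_+(c)}{h_+} + \frac{\omega_-(c)}{h_-} \Big\}
$$

subject to the restriction $\phi_+(c)h_+^2 - \phi_-(c)h_-^2 = 0$ under the assumption of
$\psi_+(c) - \{\phi_+(c)/\phi_-(c)\}^{3/2}\psi_-(c) \neq 0$.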


When $\phi_+(c) \cdot \phi_-(c) < 0$, the AFO bandwidths minimize the standard AMSE (2). When
$\phi_+(c) \cdot \phi_-(c) > 0$, the AFO bandwidths minimize the sum of the squared second-order
bias term and the variance term under the restriction that the first-order bias term be
removed. Inspecting the objective function, the resulting AFO bandwidths are $O(n^{-1/5})$
when $\phi_+(c) \cdot \phi_-(c) < 0$ and $O(n^{-1/7})$ when $\phi_+(c) \cdot \phi_-(c) > 0$.^3

^2 This is the reason why IK proceed with assuming $h_+ = h_-$.

^3 The explicit expressions of the AFO bandwidths are provided in the Supplemental Material.
When $\phi_+(c) \cdot \phi_-(c) > 0$, Definition 1 shows that $MSE_n$ is of order $n^{-6/7}$,
which implies that the MSE based on the AFO bandwidths converges to zero faster than
$n^{-4/5}$, the rate attained by the single bandwidth approaches such as the IK method.
When $\phi_+(c) \cdot \phi_-(c) < 0$, they are of the same order. However, it can be shown that the
ratio of the AMSE based on the AFO bandwidths to that based on IK never exceeds
one asymptotically (see Section 2.2 of AI).
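The two rates can be checked by substituting the bandwidth orders into the squared-bias and variance terms of Definition 1. The display below is a back-of-the-envelope sketch of that arithmetic, writing $h$ for either bandwidth; it is not a substitute for the formal argument in AI.

```latex
% Case \phi_+(c)\phi_-(c) > 0: first-order bias removed, h \asymp n^{-1/7}
\text{bias}^2 \asymp h^{6} = n^{-6/7},
\qquad
\text{variance} \asymp \frac{1}{nh} = n^{-1 + 1/7} = n^{-6/7}.

% Single bandwidth, or \phi_+(c)\phi_-(c) < 0: h \asymp n^{-1/5}
\text{bias}^2 \asymp h^{4} = n^{-4/5},
\qquad
\text{variance} \asymp \frac{1}{nh} = n^{-1 + 1/5} = n^{-4/5}.
```

In both cases the bandwidth order is the one that balances the two terms, and $n^{-6/7}$ vanishes faster than $n^{-4/5}$.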


2.2 Feasible Automatic Bandwidth Choice 

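The feasible bandwidth choice is based on a modified version of the estimated AMSE (MMSE)
as in AI. It is defined by

$$
MMSE_n^p(h_+, h_-) = \Big\{ \hat{\phi}_+(c)h_+^2 - \hat{\phi}_-(c)h_-^2 \Big\}^2
+ \Big\{ \hat{\psi}_+(c)h_+^3 - \hat{\psi}_-(c)h_-^3 \Big\}^2
+ \frac{v}{n\hat{f}(c)} \Big\{ \frac{\hat{\omega}_+(c)}{h_+} + \frac{\hat{\omega}_-(c)}{h_-} \Big\}
\tag{3}
$$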


where $\hat{\phi}_j(c)$, $\hat{\psi}_j(c)$, $\hat{\omega}_j(c)$ and $\hat{f}(c)$ are consistent estimators of
$\phi_j(c)$, $\psi_j(c)$, $\omega_j(c)$ and $f(c)$ for $j = +, -$, respectively. A key characteristic
of the MMSE is that one does not need to know the sign of the product of the second
derivatives a priori and that there is no need to solve the constrained minimization problem.

Let $(\hat{h}_+, \hat{h}_-)$ be a combination of bandwidths that minimizes the MMSE given
in (3). The next theorem shows that $(\hat{h}_+, \hat{h}_-)$ is asymptotically as good as the AFO
bandwidths.
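The minimization of the MMSE can be sketched as follows. The objective mirrors the structure of (3): squared first-order bias, squared second-order bias, and variance. The plain grid search, the function names and the pilot values are assumptions chosen for illustration; in practice a numerical optimizer and data-driven pilot estimates would be used.

```python
def mmse(h_plus, h_minus, phi, psi, omega, f_c, v, n):
    """Evaluate the MMSE criterion; phi, psi, omega are (plus, minus) pairs."""
    first = (phi[0] * h_plus ** 2 - phi[1] * h_minus ** 2) ** 2
    second = (psi[0] * h_plus ** 3 - psi[1] * h_minus ** 3) ** 2
    variance = v / (n * f_c) * (omega[0] / h_plus + omega[1] / h_minus)
    return first + second + variance


def minimize_mmse_on_grid(grid, **pilot):
    """Return the (h_plus, h_minus) pair on the grid with the smallest MMSE."""
    candidates = ((hp, hm) for hp in grid for hm in grid)
    return min(candidates, key=lambda h: mmse(h[0], h[1], **pilot))


# Hypothetical pilot values: opposite-sign phi, no second-order bias.
pilot = dict(phi=(1.0, -1.0), psi=(0.0, 0.0), omega=(1.0, 1.0),
             f_c=1.5, v=4.8, n=1250)
grid = [i / 100 for i in range(5, 101)]
h_plus_hat, h_minus_hat = minimize_mmse_on_grid(grid, **pilot)
```

With these symmetric pilot values the first-order conditions reduce to $8h^5 = v/(nf(c)) = 0.00256$, so both selected bandwidths equal 0.2. Note that the same function handles either sign of $\hat{\phi}_+(c)\hat{\phi}_-(c)$ without a case distinction, which is the practical appeal of the criterion.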


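THEOREM 1 Suppose that the conditions stated in Lemma 1 hold. Assume further
that $\hat{\phi}_j(c) \to_p \phi_j(c)$, $\hat{\psi}_j(c) \to_p \psi_j(c)$, $\hat{f}(c) \to_p f(c)$ and
$\hat{\omega}_j(c) \to_p \omega_j(c)$ for $j = +, -$, respectively. Also assume
$\psi_+(c) - \{\phi_+(c)/\phi_-(c)\}^{3/2}\psi_-(c) \neq 0$. Then, the following hold.

$$
\frac{\hat{h}_+}{h_+^*} \to_p 1, \qquad \frac{\hat{h}_-}{h_-^*} \to_p 1,
\qquad \text{and} \qquad
\frac{MMSE_n^p(\hat{h}_+, \hat{h}_-)}{MSE_n(h_+^*, h_-^*)} \to_p 1,
$$

where $(h_+^*, h_-^*)$ are the AFO bandwidths.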

The first part of Theorem 1 shows that the bandwidths based on the plug-in version of
the MMSE are asymptotically equivalent to the AFO bandwidths, and the second part
shows that the minimum value of the MMSE is asymptotically the same as the MSE
evaluated at the AFO bandwidths. Theorem 1 thus shows that the bandwidths based on the
MMSE possess the desired asymptotic properties. Theorem 1 calls for pilot estimates
for $\phi_j(c)$, $\psi_j(c)$, $f(c)$ and $\omega_j(c)$ for $j = +, -$. A detailed procedure for obtaining
the pilot estimates is given in the Supplemental Material.


3 Simulation 


We conduct simulation experiments that illustrate the advantage of the proposed method
and a potential gain from using bandwidths tailored to the FRD design over the bandwidths
tailored to the SRD design. Application of a bandwidth developed for the SRD to the
FRD context seems common in practice.^4

Simulation designs are as follows. For the treatment probability, $E[D \mid X = x] =
\int \exp[-\cdots]\,du$ for $x > 0$ and $\int \exp[-\cdots]\,du$ for $x < 0$. This leads
to the discontinuity size of 0.8. The graph is depicted in Figure 1-(a).

For the conditional expectation functions of the observed outcome, $E[Y \mid X = x]$,
we consider two designs, which are essentially the same as Designs 2 and 4 of AI for
the SRD design. The specifications for the assignment variable and the additive error
are exactly the same as those considered by IK.^5 They are depicted in Figures 1-(b) and
1-(c). We use data sets of 500 observations with 10,000 repetitions.

The results are presented in Table 1 and Figure 2. Four bandwidth selection
methods, MMSE-f, MMSE-s, IK-f and IK-s, are examined. MMSE-f is the new method
proposed in the paper and MMSE-s is the one proposed for the SRD design by AI. IK-f
and IK-s are the methods proposed by IK for the FRD and SRD designs, respectively.
The bandwidths for MMSE-s and IK-s are computed based only on the numerator of the
FRD estimator and the same bandwidths are used to estimate the denominator. Table
1 reports the mean and standard deviation of the bandwidths, the bias and root mean
squared error (RMSE) for the FRD estimates, and the relative efficiency based on the
RMSE.^6 Figure 2 shows the simulated CDF for the distance of the FRD estimate from
the true value.

^4 For example, Imbens and Kalyanaraman (2012, Section 5.1) state that the bandwidth choice for
the FRD estimator is often similar to the choice for the SRD estimator of only the numerator of the
FRD estimand.

^5 The exact functional form of $E[Y \mid X = x]$, the specification for the assignment variable and the
additive error are provided in the Supplemental Material.

(a) Treatment Probability, $E[D \mid X = x]$. (b) Design 1. Ludwig and Miller Data I (c) Design 2. Ludwig and Miller Data II
Figure 1: Simulation Designs, (a) Treatment probability, (b) $E[Y \mid X = x]$ for Design 1,
(c) $E[Y \mid X = x]$ for Design 2.


Table 1: Bias and RMSE for the FRDD, n=500

                   $\hat{h}_+$         $\hat{h}_-$         $\hat{\tau}$
Design  Method     Mean    SD       Mean    SD       Bias     RMSE    Efficiency
1       MMSE-f     0.056   0.033    0.097   0.100     0.037   0.168   1
        IK-f       0.177   0.033                      0.180   0.192   0.876
        MMSE-s     0.235   0.109    0.489   0.259     0.339   0.373   0.450
        IK-s       0.325   0.068                      0.443   0.451   0.373
2       MMSE-f     0.226   0.091    0.624   0.214    -0.002   0.072   1
        IK-f       0.284   0.052                      0.087   0.097   0.739
        MMSE-s     0.227   0.092    0.628   0.215    -0.002   0.072   1
        IK-s       0.337   0.072                      0.093   0.101   0.709

Examining Table 1 and Figure 2-(a), for Design 1, MMSE-f performs significantly
better than all other methods. For Design 2, Table 1 and Figure 2-(b) indicate that
MMSE-f and MMSE-s perform comparably but clearly dominate the IK methods currently
in wide use. In the cases we examined, the new method performs better than the currently
available methods, and using methods specifically developed for the FRD dominates the
methods developed for the SRD.
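The Efficiency column of Table 1 appears consistent with the ratio of the RMSE of MMSE-f to the RMSE of each method within a design. This reading is an assumption, since the text only says the efficiency is "based on the RMSE"; the sketch below recomputes it from the rounded RMSE values in the table, so small discrepancies in the third decimal are to be expected.

```python
# RMSE values copied from Table 1 (rounded to three decimals in the source).
RMSE = {
    1: {"MMSE-f": 0.168, "IK-f": 0.192, "MMSE-s": 0.373, "IK-s": 0.451},
    2: {"MMSE-f": 0.072, "IK-f": 0.097, "MMSE-s": 0.072, "IK-s": 0.101},
}


def relative_efficiency(design, method, baseline="MMSE-f"):
    """RMSE of the baseline divided by the RMSE of `method` (assumed definition)."""
    return RMSE[design][baseline] / RMSE[design][method]


recomputed = {
    (design, method): round(relative_efficiency(design, method), 3)
    for design in RMSE for method in RMSE[design]
}
```

Under this definition a value below one means the method has a larger RMSE than MMSE-f in that design.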


^6 The bias and RMSE are 5% trimmed versions since the unconditional finite sample variance is infinite.

(a) Design 1. Ludwig and Miller Data I (b) Design 2. Ludwig and Miller Data II
Figure 2: Simulated CDF of $|\hat{\tau} - \tau|$ for different bandwidth selection rules
Supplemental Material


This Supplemental Material provides an explicit expression of the AFO bandwidths,
details about our simulation experiment, a sketch of the proof for Lemma 1 and
procedures to obtain pilot estimates.


A Definition of the AFO Bandwidths 

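DEFINITION 2 The AFO bandwidths for the fuzzy RD estimator minimize the AMSE
defined by

$$
AMSE_{1n}(h) = \frac{1}{(\tau_D(c))^2} \Big\{ \phi_+(c)h_+^2 - \phi_-(c)h_-^2 \Big\}^2
+ \frac{v}{nf(c)(\tau_D(c))^2} \Big\{ \frac{\omega_+(c)}{h_+} + \frac{\omega_-(c)}{h_-} \Big\},
$$

when $\phi_+(c) \cdot \phi_-(c) < 0$. Their explicit expressions are given by $h_+^* = \theta^* n^{-1/5}$ and
$h_-^* = \lambda^* h_+^*$, where

$$
\theta^* = \Big\{ \frac{v\,\omega_+(c)}{4 f(c)\phi_+(c)\big[\phi_+(c) - \lambda^{*2}\phi_-(c)\big]} \Big\}^{1/5}
\quad \text{and} \quad
\lambda^* = \Big\{ -\frac{\phi_+(c)\,\omega_-(c)}{\phi_-(c)\,\omega_+(c)} \Big\}^{1/3}.
$$

When $\phi_+(c) \cdot \phi_-(c) > 0$, the AFO bandwidths for the fuzzy RD estimator minimize the
AMSE defined by

$$
AMSE_{2n}(h) = \frac{1}{(\tau_D(c))^2} \Big\{ \psi_+(c)h_+^3 - \psi_-(c)h_-^3 \Big\}^2
+ \frac{v}{nf(c)(\tau_D(c))^2} \Big\{ \frac{\omega_+(c)}{h_+} + \frac{\omega_-(c)}{h_-} \Big\}
$$

subject to the restriction $\phi_+(c)h_+^2 - \phi_-(c)h_-^2 = 0$ under the assumption of
$\psi_+(c) - \{\phi_+(c)/\phi_-(c)\}^{3/2}\psi_-(c) \neq 0$. Their explicit expressions are given by
$h_+^{**} = \theta^{**} n^{-1/7}$ and $h_-^{**} = \lambda^{**} h_+^{**}$, where

$$
\theta^{**} = \Big\{ \frac{v\big[\omega_+(c) + \omega_-(c)/\lambda^{**}\big]}{6 f(c)\big[\psi_+(c) - \lambda^{**3}\psi_-(c)\big]^2} \Big\}^{1/7}
\quad \text{and} \quad
\lambda^{**} = \Big\{ \frac{\phi_+(c)}{\phi_-(c)} \Big\}^{1/2}.
$$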
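The closed forms in Definition 2 translate directly into code. The sketch below takes the population quantities as inputs and returns the pair of AFO bandwidths; the function name and the example values are hypothetical, and the formulas are those obtained from the first-order conditions of the two objective functions (common factor $1/(\tau_D(c))^2$ dropped).

```python
def afo_bandwidths(phi, psi, omega, f_c, v, n):
    """AFO bandwidths (h_plus, h_minus); phi, psi, omega are (plus, minus) pairs."""
    phi_p, phi_m = phi
    psi_p, psi_m = psi
    om_p, om_m = omega
    if phi_p * phi_m < 0:
        # Opposite signs: minimize squared first-order bias plus variance.
        lam = (-(phi_p * om_m) / (phi_m * om_p)) ** (1.0 / 3.0)
        theta = (v * om_p
                 / (4.0 * f_c * phi_p * (phi_p - lam ** 2 * phi_m))) ** 0.2
        h_plus = theta * n ** (-0.2)
    elif phi_p * phi_m > 0:
        # Same sign: remove the first-order bias, trade off the second-order one.
        lam = (phi_p / phi_m) ** 0.5
        gap = psi_p - lam ** 3 * psi_m
        if gap == 0:
            raise ValueError("second-order bias also cancels; AFO undefined")
        theta = (v * (om_p + om_m / lam)
                 / (6.0 * f_c * gap ** 2)) ** (1.0 / 7.0)
        h_plus = theta * n ** (-1.0 / 7.0)
    else:
        raise ValueError("phi_plus * phi_minus must be nonzero")
    return h_plus, lam * h_plus


# Hypothetical inputs for each case.
case1 = afo_bandwidths(phi=(1.0, -1.0), psi=(0.0, 0.0), omega=(1.0, 1.0),
                       f_c=1.5, v=4.8, n=1250)
case2 = afo_bandwidths(phi=(1.0, 4.0), psi=(1.0, 0.0), omega=(1.0, 1.0),
                       f_c=1.5, v=4.8, n=1250)
```

In the first example both bandwidths equal 0.2; in the second, $h_-^{**}$ is half of $h_+^{**}$ so that the first-order bias is exactly removed.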
B Simulation Designs

Let $\ell_1(x) = E[Y(1) \mid X = x]$ and $\ell_0(x) = E[Y(0) \mid X = x]$. A functional form for each design
is given as follows:

(a) Design 1

$$
\ell_j(x) = \begin{cases}
\alpha_j + 18.49x - 54.8x^2 + 74.3x^3 - 45.02x^4 + 9.83x^5 & \text{if } x > 0, \\
\alpha_j + 2.99x + 3.28x^2 + 1.45x^3 + 0.22x^4 + 0.03x^5 & \text{if } x < 0,
\end{cases}
$$

where $(\alpha_1, \alpha_0) = (-0.17, 4.13)$.

(b) Design 2

$$
\ell_j(x) = \begin{cases}
\alpha_j + 5.76x - 42.56x^2 + 120.90x^3 - 139.71x^4 + 55.59x^5 & \text{if } x > 0, \\
\alpha_j - 2.26x - 13.14x^2 - 30.89x^3 - 31.98x^4 - 12.1x^5 & \text{if } x < 0,
\end{cases}
$$

where $(\alpha_1, \alpha_0) = (0.0975, 0.0225)$.

The assignment variable $X_i$ is given by $2Z_i - 1$ for each design, where $Z_i$ has a
Beta distribution with parameters $\alpha = 2$ and $\beta = 4$. We consider a normally distributed
additive error term with mean zero and standard deviation 0.1295 for the outcome
equation.
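The following sketch shows how the assignment variable and the potential outcomes of the two designs might be simulated. Names are hypothetical. The coefficient 54.8 on $x^2$ in Design 1 is read from a partly illegible digit string and should be treated as an assumption, as should assigning $x = 0$ to the left branch (both branches equal $\alpha_j$ there). The treatment probability is not coded because its functional form is not fully legible in the text.

```python
import random

# Intercepts (alpha_1, alpha_0) and polynomial coefficients for x^1..x^5.
ALPHA = {1: {1: -0.17, 0: 4.13}, 2: {1: 0.0975, 0: 0.0225}}
COEF = {
    1: {"right": (18.49, -54.8, 74.3, -45.02, 9.83),   # -54.8 is an assumption
        "left": (2.99, 3.28, 1.45, 0.22, 0.03)},
    2: {"right": (5.76, -42.56, 120.90, -139.71, 55.59),
        "left": (-2.26, -13.14, -30.89, -31.98, -12.1)},
}
ERROR_SD = 0.1295


def ell(x, design, j):
    """Regression function E[Y(j) | X = x] for the given design, j in {0, 1}."""
    side = "right" if x > 0 else "left"
    poly = sum(b * x ** (p + 1) for p, b in enumerate(COEF[design][side]))
    return ALPHA[design][j] + poly


def draw_assignment(rng):
    """X = 2Z - 1 with Z ~ Beta(2, 4), so X lies in [-1, 1]."""
    return 2.0 * rng.betavariate(2.0, 4.0) - 1.0


def draw_potential_outcome(x, design, j, rng):
    """Y(j) = ell_j(x) + normal error with standard deviation 0.1295."""
    return ell(x, design, j) + rng.gauss(0.0, ERROR_SD)


rng = random.Random(0)
xs = [draw_assignment(rng) for _ in range(500)]
y1 = [draw_potential_outcome(x, design=1, j=1, rng=rng) for x in xs]
```

Because $Z_i \sim \text{Beta}(2, 4)$ has mean 1/3, the assignment variable is centred to the left of the cut-off at zero, so the two sides typically have different sample sizes; this is one of the situations where separate bandwidths are expected to help.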


C Proofs 


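Proof of Lemma 1: As in the proof of Lemma A2 of Calonico, Cattaneo, and Titiunik
(2014), we utilize the following expansion

$$
\frac{\hat{\tau}_Y(c)}{\hat{\tau}_D(c)} - \frac{\tau_Y(c)}{\tau_D(c)}
= \frac{1}{\tau_D(c)}\big(\hat{\tau}_Y(c) - \tau_Y(c)\big)
- \frac{\tau(c)}{\tau_D(c)}\big(\hat{\tau}_D(c) - \tau_D(c)\big)
+ \frac{\tau(c)}{\tau_D(c)\hat{\tau}_D(c)}\big(\hat{\tau}_D(c) - \tau_D(c)\big)^2
- \frac{1}{\tau_D(c)\hat{\tau}_D(c)}\big(\hat{\tau}_Y(c) - \tau_Y(c)\big)\big(\hat{\tau}_D(c) - \tau_D(c)\big).
\tag{5}
$$

Since the treatment of the variance component is exactly the same as that by IK, we
only discuss the bias component. Observe that Lemma 1 of Arai and Ichimura (2015)
implies the bias of $\hat{\tau}_Y(c)$ and $\hat{\tau}_D(c)$ are equal to

$$
C_1 \big[ m^{(2)}_{Y+}(c)h_+^2 - m^{(2)}_{Y-}(c)h_-^2 \big]
+ \big[ \zeta_{Y+}(c)h_+^3 - \zeta_{Y-}(c)h_-^3 \big] + o\big(h_+^3 + h_-^3\big)
$$

and

$$
C_1 \big[ m^{(2)}_{D+}(c)h_+^2 - m^{(2)}_{D-}(c)h_-^2 \big]
+ \big[ \zeta_{D+}(c)h_+^3 - \zeta_{D-}(c)h_-^3 \big] + o\big(h_+^3 + h_-^3\big),
$$

respectively. Combining these with the expansion given by (5) produces the required
result.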
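The expansion used in the proof is an exact algebraic identity, which can be confirmed with rational arithmetic. In the sketch below the function name and the numerical values are arbitrary illustrations; `tau_y` and `tau_d` stand for the numerator and denominator jumps and the hatted quantities for their estimates.

```python
from fractions import Fraction


def ratio_expansion(tau_y_hat, tau_d_hat, tau_y, tau_d):
    """Right-hand side of the four-term expansion of tau_hat - tau."""
    tau = tau_y / tau_d
    dy = tau_y_hat - tau_y          # estimation error in the numerator
    dd = tau_d_hat - tau_d          # estimation error in the denominator
    return (dy / tau_d
            - tau / tau_d * dd
            + tau / (tau_d * tau_d_hat) * dd ** 2
            - dy * dd / (tau_d * tau_d_hat))


tau_y_hat, tau_d_hat = Fraction(7, 10), Fraction(9, 10)
tau_y, tau_d = Fraction(3, 5), Fraction(4, 5)
lhs = tau_y_hat / tau_d_hat - tau_y / tau_d
rhs = ratio_expansion(tau_y_hat, tau_d_hat, tau_y, tau_d)
```

The first two terms are linear in the estimation errors and generate the bias expressions in Lemma 1; the last two are products of errors and are of smaller order.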


D Procedures to Obtain Pilot Estimates 

Procedures to obtain pilot estimates for $m^{(2)}_{Yj}(c)$, $m^{(3)}_{Yj}(c)$, $f(c)$, $f^{(1)}(c)$, and
$\sigma^2_{Yj}(c)$, for $j = +, -$, are exactly the same as those for the sharp RD design by AI
(see Appendix A of AI). Pilot estimates for $m^{(2)}_{Dj}(c)$, $m^{(3)}_{Dj}(c)$, $\sigma^2_{Dj}(c)$, for
$j = +, -$, can be obtained by replacing the role of $Y$ by $D$ in Steps 2 and 3 of Appendix A
of AI. Pilot estimates for $\sigma_{YDj}(c)$, $j = +, -$, are obtained analogously to $\sigma^2_{Yj}(c)$.
We obtain a pilot estimate for $\tau_D(c)$ by applying the sharp RD framework with the outcome
variable $D$ and the assignment variable $X$.